and Mr. W. J. Dejka, NOSC, Code 8302. The work was performed by the author
at the Naval Postgraduate School, Monterey, California. 
EXPERIMENTS WITH VOICE INPUT FOR COMMAND AND CONTROL:



USING VOICE INPUT TO OPERATE A DISTRIBUTED COMPUTER NETWORK 
I. EXECUTIVE SUMMARY 

This paper describes an experiment in which military officers used
voice recognition equipment to enter commands verbally to the ARPANET, a
large distributed network of computers located across the United States
and in other countries.

The objective was to determine if it was at all feasible to operate
this network using commercially available state-of-the-art voice input
equipment, and to compare this mode of entry with the normal manual typing
input method.

Twenty-four military officers who already knew how to operate the
ARPANET participated in the experiment. They were first introduced to
the voice equipment and then allowed to practice with it over a few days
until they felt "comfortable" with it. Because they had previously used
the ARPANET for hundreds of hours with manual typing input, the amount of
voice practice was left to each subject's own judgment of when he or she
felt comfortable. The average subject practiced for 3.26 hours with the
voice recognition equipment and then told the experimenter he/she was ready
to participate in the experiment.

The experiment was then scheduled for an evening or weekend when the
load average was under 3 on the host computers, to ensure fast network
response times.

In the experiment, subjects followed a fixed scenario of instructions
in which they accessed the ARPANET, logged into different host computers,
read messages, sent messages, checked for new mail, read files, transferred
files between host computers, deleted files, and interconnected host
computers. Each subject performed this scenario four times with either voice
input first or typing input first, and then performed it four times with the
other method of input. The scenario was designed to take about 10 minutes
to perform, but the actual performance times ranged from 6 to 18 minutes.
In order to measure any free time the subjects had while carrying out the
scenario, a secondary task was included in which they transcribed information
from civil aviation weather reports onto a data sheet. Their main task,
therefore, was to run the ARPANET according to the scenario, but during
their free time they were to transcribe the aviation weather data.

Keeping in mind that the average subject used the voice input method
a little over 3 hours before doing the experiment, the results are quite
significant.

The results, averaged across all trials of the experiment, show:

1) Voice input was 17.5% faster than manual typing input.

2) Typing input had 183.2% more entry errors.

3) Voice input allowed subjects to transcribe 25.0% more aviation weather
information than during manual input.

These results are all statistically significant (p < .05) and suggest
that it is feasible to use current (1979) commercially available voice
recognition equipment to run many standard operations of an ARPANET type
network.

In an era when so much is said and written about declining productivity,
voice input technology may be one way to help reverse this trend. As we
observed here, with minimal practice the job was done 17.5% faster and,
at the same time, 25.0% more was done on another task.

What would happen if experienced voice input subjects were used?
II. INTRODUCTION



This paper describes an experiment in which a Threshold Technology, Inc.,
Model T600 discrete utterance voice recognition system was used to command
the running and operation of the ARPANET.

III. OBJECTIVE 

The objective of this experiment was to determine if it was at all
feasible to operate a distributed computer network using voice input. The
ARPANET was used in an unclassified mode to simulate the types of commands
and operations used in and between military command centers. The ARPANET
technology is the basis for the Advanced Command and Control Architectural
Testbed (ACCAT), a classified subnet of the ARPANET on which several
command centers are linked together for the purposes of testing and
examining new software and hardware ideas applicable to command and control.
Command centers on this network are located at installations such as the
Naval Postgraduate School (NPS) in Monterey, California, the Naval Ocean
Systems Center (NOSC) in San Diego, California, and CINCPACFLT in Hawaii.

Future voice input experiments will be run on this classified network in 
addition to the unclassified ARPANET. 

IV. SUBJECTS 

Twenty-four subjects participated on a volunteer basis with no monetary
or other incentive. They included 23 male military officers from the Army,
Navy, Air Force and Marine Corps, and one civilian female from the National
Security Agency. Nineteen were enrolled in the Command and Control
curriculum at NPS, 2 were enrolled in the intelligence curriculum at NPS,
and 3 were military staff members at NPS. Military ranks ranged from
Lieutenant to Commander and from Captain to Lieutenant Colonel.
All subjects were experienced in using the ARPANET with manual typing
input from a keyboard.

None of the subjects had ever used voice recognition equipment and only 
MUfc had ever seen such equipment used. 

V. INITIAL TRAINING AND EQUIPMENT USED

Subjects individually met with the experimenter initially and were given
a subjective questionnaire regarding their opinions about using voice input
versus manual typing input. At this time, they were also given a typing
ability test.

They were then told the basic ideas of how the voice recognition
equipment recognizes what they say and were also shown how we would be
training the equipment for recognition.

The Threshold Technology, Inc. Model T600 voice recognition unit had
several added memory modules which allowed up to 256 two-second voice
utterances to be used. In this experiment, 180 of the possible 256 utterances
(an utterance is any continuously spoken pattern of speech from .1 second up
to 2 seconds long) were actually entered into the voice recognition unit,
although only about 75 utterances were actually needed in the experiment.
The two-second maximum length for any utterance is a limitation imposed by
the manufacturer.

The voice recognition unit also contained a magnetic tape cartridge,
which allowed the experimenter to record each subject's voice patterns
and ARPANET commands after the subject had initially trained the machine.
Then, when the subject came back to use the equipment at later times, the
magnetic tape cartridge was simply read back into memory and the subject was
ready to give voice input commands. (This is a nice feature, as it allows
one to take the equipment anywhere and connect to any computer or computer
network without relying on the host computer to store voice patterns. The
tape cartridge feature also allows one to have a tape available for each type 
of task one might do. Then, if one switches to a new task which requires 
several hundred utterances unique to that task, one simply loads another 
tape cartridge containing the voice patterns and commands for that task.) 
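In effect, the cartridge is a portable, per-task store of one user's voice patterns together with the command stream for each pattern. A minimal Python sketch of the same idea, assuming a JSON file stands in for the tape cartridge and that each voice pattern is a list of numbers (the T600's actual recording format is not described here); `save_profile` and `load_profile` are hypothetical names:

```python
import json
from pathlib import Path
from typing import Dict, List, Tuple

Templates = Dict[str, List[float]]  # utterance label -> stored voice pattern (assumed numeric)
Commands = Dict[str, str]           # utterance label -> ASCII stream sent to the host


def save_profile(path: Path, templates: Templates, commands: Commands) -> None:
    """Write one user's patterns and commands for one task (the 'cartridge')."""
    path.write_text(json.dumps({"templates": templates, "commands": commands}))


def load_profile(path: Path) -> Tuple[Templates, Commands]:
    """Read a profile back into memory so the user can start speaking at once."""
    data = json.loads(path.read_text())
    return data["templates"], data["commands"]


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        cartridge = Path(tmp) / "arpanet_scenario.json"  # hypothetical: one file per task
        save_profile(cartridge, {"foxtrot": [0.2, 0.7]}, {"foxtrot": "F"})
        templates, commands = load_profile(cartridge)
        print(commands["foxtrot"])
```

Switching tasks then amounts to loading a different file, and nothing about the user's voice has to live on the host.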

In this experiment, we also used the unbuffered mode, which means that
if the voice recognizer accepted a voice input, an ASCII character stream was
immediately sent to the host computer without any verification by the
operator that the voice recognizer had correctly interpreted the voice input.

This allows for the possibility that one might say one thing but the voice
recognizer "thinks" you said something else and therefore transmits the
wrong ASCII stream. If an utterance is totally unacceptable, the voice
recognizer just beeps. We could have guaranteed absolutely no input errors
to the host computers if we had used the buffered mode, which displays
up to 128 utterances in series on a CRT and does not transmit the ASCII
stream of characters until the operator verifies the stream and gives
permission to transmit to the host computer.
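The difference between the two modes can be sketched in a few lines of Python. This is an illustration only: the vocabulary entries, the `Recognizer` class, and its method names are hypothetical, and an unrecognized utterance is modeled as `None`. Only the 128-utterance display limit and the beep-on-rejection behavior come from the text.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

# Hypothetical vocabulary: recognized utterance -> ASCII stream for the host.
VOCAB: Dict[str, str] = {"foxtrot": "F", "nine": "9", "carriage return": "\r"}
DISPLAY_LIMIT = 128  # buffered mode shows up to 128 utterances before sending


@dataclass
class Recognizer:
    send: Callable[[str], None]  # delivers characters to the host computer
    buffered: bool = False
    pending: List[str] = field(default_factory=list)

    def accept(self, utterance: Optional[str]) -> bool:
        """Handle one recognition result; None stands for an unacceptable utterance."""
        if utterance not in VOCAB:
            print("beep")                   # rejected: nothing reaches the host
            return False
        if not self.buffered:
            self.send(VOCAB[utterance])     # unbuffered: sent at once, unverified
        elif len(self.pending) < DISPLAY_LIMIT:
            self.pending.append(utterance)  # buffered: shown on the CRT and held
        else:
            raise OverflowError("display full; verify or clear first")
        return True

    def transmit(self) -> None:
        """Operator has checked the displayed utterances; send them as one stream."""
        self.send("".join(VOCAB[u] for u in self.pending))
        self.pending.clear()

    def clear(self) -> None:
        """Operator rejects the displayed utterances; nothing is sent."""
        self.pending.clear()
```

With `buffered=False` a misrecognized command goes straight to the host, which is the risk accepted in this experiment; with `buffered=True` the operator sees it first and can call `clear()` instead of `transmit()`.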

In brief, then, this voice recognition equipment allows for up to 256
utterances, and each utterance has an associated ASCII output stream.
The subject can speak as many utterances as he wishes, as long as there is a
.1-second pause between utterances. During an utterance, one must speak
continuously for up to 2 seconds; the voice recognizer then looks for a pause
of at least .1 second, which signals that the old utterance has ended and a
new utterance may be coming. Therefore, in normal talking, the following works
fine if a .1-second pause is inserted where indicated: "Select a map of the
Med (pause) Show all Russian submarines (pause) How much fuel do they have?
(pause) What is their destination? (pause)."
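The pause rule above can be illustrated with a small segmenter. This sketch assumes the audio has already been reduced to a voiced/silent decision every 10 ms; the frame length, the function name `segment`, and the choice to flag out-of-range utterances as rejected are assumptions, not a description of the T600's internals. The .1-second pause and the .1 to 2 second utterance limits come from the text.

```python
from typing import List, Optional, Tuple

FRAME_S = 0.01     # assumed 10 ms analysis frames (not a T600 specification)
MIN_PAUSE_S = 0.1  # a pause this long ends an utterance
MIN_LEN_S = 0.1    # shortest utterance
MAX_LEN_S = 2.0    # longest utterance (manufacturer's limit)

Span = Tuple[float, float]


def segment(voiced: List[bool]) -> Tuple[List[Span], List[Span]]:
    """Split a per-frame voiced/silent track into utterances.

    Returns (accepted, rejected) lists of (start_s, end_s) spans; gaps shorter
    than MIN_PAUSE_S stay inside the current utterance.
    """
    pause_frames = round(MIN_PAUSE_S / FRAME_S)
    accepted: List[Span] = []
    rejected: List[Span] = []
    start: Optional[int] = None
    last_voiced = 0
    silence = 0
    # Pad with silence so a final utterance is closed off too.
    for i, is_voiced in enumerate(voiced + [False] * pause_frames):
        if is_voiced:
            if start is None:
                start = i
            last_voiced, silence = i, 0
        elif start is not None:
            silence += 1
            if silence == pause_frames:
                span = (start * FRAME_S, (last_voiced + 1) * FRAME_S)
                length = span[1] - span[0]
                ok = MIN_LEN_S - 1e-9 <= length <= MAX_LEN_S + 1e-9
                (accepted if ok else rejected).append(span)
                start, silence = None, 0
    return accepted, rejected
```

A 50 ms hesitation inside a phrase does not split it, while speaking for 2.5 seconds without a .1-second break produces one over-long utterance that lands in `rejected`.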

For this experiment, each subject trained the voice recognizer 10 times
for each of the utterances and was then told he could practice running the
ARPANET with voice commands. He could practice as much or as little as he
wanted during the next week until he felt comfortable using voice input.

Then he was to tell the experimenter he was ready to do the actual experiment. 
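The paper does not say how the T600 turns ten training passes into a stored pattern. Purely to illustrate the general idea of repeated training, here is a sketch that assumes each pass yields a fixed-length feature vector, averages the passes into one template, and recognizes by picking the nearest template, rejecting (the "beep") when nothing is close enough. The names `train` and `recognize` and the distance threshold are hypothetical.

```python
import math
from typing import Dict, List, Optional, Sequence


def train(passes: List[Sequence[float]]) -> List[float]:
    """Average the feature vectors from repeated training passes into one template."""
    if not passes:
        raise ValueError("need at least one training pass")
    return [sum(column) / len(passes) for column in zip(*passes)]


def recognize(sample: Sequence[float], templates: Dict[str, List[float]],
              reject_above: float) -> Optional[str]:
    """Label of the nearest template, or None (the unit would beep) if nothing is close."""
    best_label: Optional[str] = None
    best_dist = math.inf
    for label, template in templates.items():
        dist = math.dist(sample, template)  # Euclidean distance (Python 3.8+)
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label if best_dist <= reject_above else None
```

Averaging several passes smooths out pass-to-pass variation in how a word is spoken, which is one plausible reason to train each utterance more than once.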

Subjects practiced from 1 to 8 hours, with the average being 3.26 hours.
It is important to keep in mind that the results which follow are for
subjects who had used typing input to the ARPANET for hundreds of hours
and had used voice input for only about 3 hours.

VI. EXPERIMENTAL PROCEDURE

The experiment was run in the evening or on weekends so the load aver- 
age would be under 3 on the ARPANET hosts used. This, in fact, occurred with 
each of the 3 host computers used in the experiment. Two of the hosts were 
in southern California and one in Massachusetts. They were accessed from the 
NPS Terminal Interface Processor (TIP) located at NPS. 

Based on the initial typing ability test, subjects were split into 2 
groups called "SLOW" and "FAST" typers. The actual typing abilities ranged 
from 17 to 49 words per minute. 

The actual experiment required subjects to follow a specific step-by-step
scenario of instructions which required them to access the ARPANET, log
into host computers, read messages, send messages, check for new mail, read
files, transfer files between host computers, delete files, and interconnect
host computers. The scenario was designed to take about 10 minutes to go
through its steps one time. This scenario can be found in Appendix II.
The scenario was performed 4 times by each subject using voice input and
4 times using manual typing input. Half of the "SLOW" typers performed 4
trials through the scenario using typing input first, followed by 4 trials
using voice input. The other half used voice input first, followed by 4 trials
using typing input. The "FAST" typing group was likewise counterbalanced,
with half using voice first and half using typing first.
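The counterbalancing can be made concrete with a short schedule generator. This is a sketch under stated assumptions: each typing group has an even number of subjects (12 per group is assumed in the check below; the text does not give group sizes), and order is assigned by simple alternation, whereas the actual assignment procedure is not described. `schedule` is a hypothetical name.

```python
from itertools import cycle
from typing import Dict, List, Tuple

ORDERS = (("typing", "voice"), ("voice", "typing"))  # typing-first, voice-first
TRIALS = 4                                            # trials per input method


def schedule(groups: Dict[str, List[str]]) -> Dict[str, List[Tuple[str, int]]]:
    """Alternate input-method order within each typing-ability group, so each
    group splits evenly between typing-first and voice-first subjects."""
    plan: Dict[str, List[Tuple[str, int]]] = {}
    for members in groups.values():
        orders = cycle(ORDERS)  # restart the alternation in every group
        for subject in members:
            first, second = next(orders)
            plan[subject] = ([(first, t) for t in range(1, TRIALS + 1)]
                             + [(second, t) for t in range(1, TRIALS + 1)])
    return plan
```

Balancing order within each typing group, rather than only overall, keeps order effects from being confounded with typing ability.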

A conceptual design for the experiment is shown in Figure 1. This is
a three-factor nested design with repeated measures over input method and
trials; each subject, however, is nested within only one of the two typing
ability conditions.
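The degrees of freedom reported in the analysis of variance tables follow directly from this design: 24 subjects split across 2 typing groups, each measured under 2 input methods on 4 trials. The short Python check below derives them; the function name `anova_df` is ours, and the labels mirror the table rows.

```python
from typing import Dict


def anova_df(n: int, a: int, b: int, c: int) -> Dict[str, int]:
    """Degrees of freedom when n subjects are split across a groups (typing ability T)
    and each subject is measured under b input methods (I) and c trials (Tr)."""
    s = n - a  # df for subjects within groups
    return {
        "Between subjects": n - 1,
        "T": a - 1,
        "Subj. w. groups": s,
        "Within subjects": n * (b * c - 1),
        "I": b - 1,
        "T x I": (a - 1) * (b - 1),
        "I x subj. w. groups": (b - 1) * s,
        "Tr": c - 1,
        "T x Tr": (a - 1) * (c - 1),
        "Tr x subj. w. groups": (c - 1) * s,
        "I x Tr": (b - 1) * (c - 1),
        "T x I x Tr": (a - 1) * (b - 1) * (c - 1),
        "I x Tr x subj. w. groups": (b - 1) * (c - 1) * s,
    }


df = anova_df(n=24, a=2, b=2, c=4)
WITHIN_PARTS = ["I", "T x I", "I x subj. w. groups", "Tr", "T x Tr",
                "Tr x subj. w. groups", "I x Tr", "T x I x Tr", "I x Tr x subj. w. groups"]
assert df["T"] + df["Subj. w. groups"] == df["Between subjects"] == 23
assert sum(df[k] for k in WITHIN_PARTS) == df["Within subjects"] == 168
assert df["Between subjects"] + df["Within subjects"] == 24 * 2 * 4 - 1  # 191 in total
```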

VII. SECONDARY TASK 

In addition to performing the main task in the scenario set of
instructions as fast and accurately as possible, subjects were given a stack
of civil aviation weather reports with a blank data sheet for each report.
When the subject had spare time between steps of the main scenario (for
example, while the host computer was transferring a file), the subject was to
read the data sheet and record the appropriate data from the aviation weather
report. For example, a data sheet might ask for runway visual range, fog
conditions and cloud cover. The subject was to find the correct alphanumeric
information on the weather report and write it on the data sheet. When done
with one data sheet, he proceeded to the next one as soon as possible. The
data sheets did not always ask for the same information, and the weather
reports had random alphanumeric information on them to prevent any pattern
of learning.

After the experiment was finished, each subject was given the same
questionnaire taken about two weeks before concerning opinions and views
on manual typing input and voice input.
VIII. DEPENDENT VARIABLES



During all trials, the following were measured: 

1) Time to complete the scenario.

2) Number of input command errors to the computer network.

3) Number of characters transcribed correctly on the secondary task.

Note: We were interested in the number of times the network was instructed
to do something wrong. Therefore, on typing input, for example, if a
command input was typed in wrong, it was counted as one error, whether
there was one or several actual keystrokes typed wrong. Similarly,
for voice input, if a subject spoke the wrong scenario command, the
voice recognizer may have recognized the voice input correctly, but
it would be a wrong command to the host and therefore was an error.
Likewise, if the voice recognizer incorrectly identified a voice
input and sent out the wrong command, this was an error. We were not
interested in detailed analysis of how many times one voice utterance
might get confused with another, i.e., the word "five" confused with
the word "nine," etc.
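The counting rule can be stated compactly in code: what matters is each command the host received, not how many keystrokes or which component caused the mistake. A minimal sketch, assuming each scenario step is logged as the expected command plus every command the host actually received for it (retries included); `command_errors` and the sample commands are hypothetical.

```python
from typing import List, Tuple


def command_errors(steps: List[Tuple[str, List[str]]]) -> int:
    """Count wrong commands received by the host.

    steps: for each scenario step, the expected command and every command the
    host actually received for that step (in order, including retries).
    A wrong command counts once, however many characters were wrong and whether
    the speaker, the typist, or the recognizer caused it.
    """
    return sum(got != expected
               for expected, received in steps
               for got in received)
```

For example, a mistyped login name with three transposed letters counts as a single error, as does a spoken "nine" that the recognizer sent as "5".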

In addition, we had ranked data from the subjects on their "before" and
"after" opinions on the questionnaire. The questions were ranked on a scale
from 1 (strong feeling for manual input) to 7 (strong feeling for voice input),
with 4 in the middle meaning neutral feeling between voice and typing input
modes. These questions can be found in Appendix I.

IX. RESULTS

A. Results for Scenario Times

Figure 2 shows the times taken to perform the set of actions in the
scenario. Table I shows the statistical results from the analysis of
variance on times. (An α level of .05 had been chosen in the original
experiment. Whenever a difference is called significant in this paper, it
will mean that there is only a 5% chance or less that we are wrong when we
say there was a significant difference in certain conditions.)
TABLE I. Analysis of Variance for Scenario Times

Source                        df          MS          F
Between subjects              23
  T (typing ability)           1        4.69
  Subj. w. groups             22        8.45
Within subjects
  I (input method)             1      140.97     45.33*
  T x I                        1        4.21       1.35
  I x subj. w. groups         22        3.11
  Tr (trials)                  3       57.72    190.50*
  T x Tr                       3         .16
  Tr x subj. w. groups        66         .30
  I x Tr                       3        2.09       2.72
  T x I x Tr                   3         .43
  I x Tr x subj. w. groups    66         .77

* p < .01



As can be seen in Figure 2, voice input was consistently faster than
manual typing input, by an average of 17.5%. This is a statistically
significant difference in favor of voice input, and it is even more
important when we consider that the subjects had used voice input for only
about 3 hours in their entire lives.

There was also a significant decrease in time over trials with both 
methods as indicated in Table I. A range test showed a significant improve- 
ment in time between each trial. We will never know if more trials would 
have improved performance even more. Four trials were initially chosen 
under each method of input, and as it turned out, the actual experimenta- 
tion time for each subject was about 2 hours which left most of the subjects 
quite fatigued and mentally exhausted. 

Typing ability had no significant effect on scenario times. Both
"slow" and "fast" typers consistently performed better using voice input.

B. Results for Errors 

Figure 3 illustrates the errors input to the system. The ANOVA results
in Table II indicate a significant effect of typing ability, and the "slow"
and "fast" typers are therefore illustrated separately in Figure 3. Under
both manual typing and voice input methods, the "fast" typers consistently
made more errors than "slow" typers.

Under manual typing input, this was evident to the experimenter because
"slow" typers were generally slow but quite precise in what they typed.
However, "fast" typers would "go like hell" and thus cause a series of errors
all at once. This personal characteristic of the "fast" typers appears to
carry over into their performance using voice input, since Figure 3 shows
"fast" typers having consistently more errors than "slow" typers when using
voice input also.
TABLE II. Analysis of Variance for Errors

Source                        df          MS          F
Between Subjects              23
  T (typing ability)           1      154.08     5.64**
  Subj. w. groups             22       27.30
Within Subjects              168
  I (input method)             1      825.02     64.51*
  T x I                        1       15.19       1.19
  I x subj. w. groups         22       12.79
  Tr (trials)                  3       96.33     13.11*
  T x Tr                       3        7.31
  Tr x subj. w. groups        66        7.35
  I x Tr                       3       35.85      5.21*
  T x I x Tr                   3         .85
  I x Tr x subj. w. groups    66        6.88

* p < .01   ** p < .05
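Each F in these tables is the effect's mean square divided by the mean square of its error term, where the error term is the subjects-within-groups interaction that shares the effect's within-subject factors. The Python check below recomputes the F ratios of Table II from its printed mean squares; the dictionary names are ours. The same error-term assignment applies to Tables I and III.

```python
# Mean squares copied from Table II (errors).
MS = {
    "T": 154.08, "Subj. w. groups": 27.30,
    "I": 825.02, "T x I": 15.19, "I x subj. w. groups": 12.79,
    "Tr": 96.33, "T x Tr": 7.31, "Tr x subj. w. groups": 7.35,
    "I x Tr": 35.85, "T x I x Tr": 0.85, "I x Tr x subj. w. groups": 6.88,
}

# Each effect is tested against the subjects-within-groups term that shares
# its within-subject factors.
ERROR_TERM = {
    "T": "Subj. w. groups",
    "I": "I x subj. w. groups", "T x I": "I x subj. w. groups",
    "Tr": "Tr x subj. w. groups", "T x Tr": "Tr x subj. w. groups",
    "I x Tr": "I x Tr x subj. w. groups", "T x I x Tr": "I x Tr x subj. w. groups",
}


def f_ratios(ms: dict) -> dict:
    """F = MS(effect) / MS(error term), rounded to two places as in the tables."""
    return {effect: round(ms[effect] / ms[error], 2) for effect, error in ERROR_TERM.items()}


if __name__ == "__main__":
    for effect, f in f_ratios(MS).items():
        print(f"{effect:12s} F = {f}")
```

The recomputed values for typing ability, input method, T x I, trials and I x Tr match the printed 5.64, 64.51, 1.19, 13.11 and 5.21.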
Table II also shows an overall difference between voice input errors and
manual typing input errors, as illustrated in Figure 3. Typing input averaged
183% more input command errors than did voice input.

One will recall from the previous section that there was no difference
in scenario times for "slow" versus "fast" typers. If this is considered in
combination with errors, it appears that any time improvement gained by "fast"
typers is probably offset by their making more errors, which requires more
time for correcting input commands. Their scenario times are, therefore,
similar to those of "slow" typers, who do not do the scenario as fast but also
make fewer errors and so spend less scenario time correcting errors.

Table II also shows a significant difference in errors over trials. A
range test indicated a significant decrease in errors from trial 1 to trial 2
to trial 3 over all conditions, but, on the average, trial 4 showed no
improvement from trial 3. Table II also shows a significant interaction
between trials and input method, which is due mainly to the effect between
trials 3 and 4, where errors increased under typing input but decreased under
voice input. (See Appendix IV for voice recognizer performance details.)

C. Results for Secondary Task

Figure 4 shows the number of characters transcribed correctly on the
secondary task using aviation weather report sheets. Since all subjects made
so few errors on this task (five or fewer), the number of characters
transcribed is actually the number correctly transcribed minus the number
incorrectly transcribed.
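As an illustration of this scoring rule (correct characters minus incorrect characters), here is a small Python sketch. The field names and values in the usage check are hypothetical, and treating characters the subject never wrote as neither correct nor incorrect is an assumption; the paper states only the net rule. `net_characters` is a hypothetical name.

```python
from typing import Dict


def net_characters(key: Dict[str, str], written: Dict[str, str]) -> int:
    """Characters transcribed correctly minus characters transcribed incorrectly.

    Fields are compared position by position; characters the subject never
    wrote count as neither correct nor incorrect (an assumption).
    """
    correct = incorrect = 0
    for field_name, answer in key.items():
        for want, got in zip(answer, written.get(field_name, "")):
            if got == want:
                correct += 1
            else:
                incorrect += 1
    return correct - incorrect
```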

Table III indicates a significant difference between input methods. These
results are shown in Figure 4, illustrating that 25.0% more information was
transcribed on the secondary task during voice input than during manual typing
input. In addition, there was a significant increase over trials. A range
test showed significant increases in characters transcribed from trial 1 to
trial 2 to trial 3, but no difference between trials 3 and 4.

TABLE III. Analysis of Variance for Characters Transcribed
on the Secondary Task

Source                        df          MS          F
Between Subjects              23
  T (typing ability)           1   30,451.69       1.68
  Subj. w. groups             22   18,684.15
Within Subjects              168
  I (input method)             1  101,292.19     24.32*
  T x I                        1    2,581.33
  I x subj. w. groups         22    4,164.45
  Tr (trials)                  3   46,599.58     59.86*
  T x Tr                       3      359.41
  Tr x subj. w. groups        66      778.53
  I x Tr                       3    1,137.69       1.09
  T x I x Tr                   3      913.72
  I x Tr x subj. w. groups    66    1,045.41

* p < .01

D. Subjective Questionnaire Results

The subjective opinions received from each subject provided "before" and
"after" data on the same questions. As described previously, these opinions
were ranks on a scale from 1 to 7, and a nonparametric sign test (two-tailed;
α = .10) was therefore used to test for any general shifts in subjects' answers.
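For readers who want to apply the same test, here is a minimal exact two-tailed sign test in Python. It is a sketch of the standard procedure, not a reconstruction of the original computation; the paper does not list individual ratings, so the numbers in the check below are made up. Ties (an unchanged rating) are dropped, and under the null hypothesis each remaining shift is equally likely to go up or down.

```python
from math import comb
from typing import Sequence

ALPHA = 0.10  # two-tailed level used for the questionnaire


def sign_test(before: Sequence[int], after: Sequence[int]) -> float:
    """Two-tailed exact sign test for paired before/after ratings; returns p."""
    shifts = [a - b for b, a in zip(before, after) if a != b]  # drop ties
    n = len(shifts)
    if n == 0:
        return 1.0
    ups = sum(s > 0 for s in shifts)
    k = min(ups, n - ups)
    one_tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * one_tail)
```

A shift is then reported as significant when `sign_test(before, after) < ALPHA`; with 9 of 10 non-tied subjects moving toward voice, for instance, p is 22/1024, or about .021.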

Subjects showed the following trends in their "before" and "after" feel- 
ings. The numbers following each item show the average response before and 
after, where the response scale was 1 for strong typing input feeling, 4 was 
a neutral feeling and 7 was a strong feeling for voice. 

a) Subjects showed a significant shift in opinion concerning ease of
input. Before the experiment, they had a feeling voice would be easier than
manual input of commands to the computer, and afterward they felt even more
strongly that this was the case (avg. before = 4.58; avg. after = 6.13).

b) With respect to whether they would be more frustrated using manual
typing or voice input, subjects started out feeling manual typing would be
more frustrating and felt even more strongly about this afterward (avg.
before = 3.42; avg. after = 2.63).

c) After the experiment, subjects felt more strongly that voice input 
allowed one more time and freedom to do other things than did manual typing 
input (avg. before = 5.88; avg. after = 6.63). 

d) Subjects also started out with a feeling that voice input might allow
more flexibility in entering items to a computer and afterward felt even more
strongly about this (avg. before = 3.92; avg. after = 4.58). This author had
thought they would consider manual input more flexible. However, since the
vocabulary of utterances for each subject included all the single digits and
the entire military alphabet, they actually had a lot of flexibility with
voice also. For example, to "Forward" a message, the message system required
an "F" to be input. They could simply say "Forward message," which would
transmit the "F," but many of them also used "Foxtrot" from the military
alphabet, which also transmitted an "F."

e) When asked if they would be more relaxed using manual typing or
voice input, their response showed no statistically significant change. They
started out feeling they would be more relaxed with voice input, and their
feeling remained that way (avg. before = 5.00; avg. after = 5.67).

Four questions were based on a scale from 1 to 7, with 1 meaning absolutely
not, 4 meaning neutral and 7 meaning absolutely yes. These results were:

a) Subjects showed a significant change when asked if, in general, they
liked the idea of voice input (avg. before = 6.00; avg. after = 6.50). They
thought they would like it before and subsequently did.

b) When asked if they would like to use voice input in everyday tasks,
if it were applicable, they showed a similar significant change (avg. before =
6.00; avg. after = 6.54).

c) When asked if voice input could be applicable in command and control
tasks, subjects started out feeling quite positive and felt more strongly
about this after the experiment (avg. before = 6.04; avg. after = 6.38).

d) When asked if voice input could be used in military tasks other than
command and control, they felt before the experiment that it could and
retained this opinion afterward (avg. before = 6.00; avg. after = 6.29).

Finally, the question "Does voice input provide a better man-machine
interface?" was asked only at the end of the experiment. On the same
"absolutely no" to "absolutely yes" seven-point scale, the average subject
response was toward "yes," with an average response of 5.80.
X. OTHER OBSERVATIONS



A. There was no correlation between the amount of practice time 
subjects spent in becoming familiar with the voice recognition method 
and the fastest time in which they were able to perform the scenario 
using voice input. Likewise, there was no correlation between 
practice time and errors entered to the network. 

B. Voice input offers a better man-machine interface because the user
can operate under conditions familiar to him. In the current
experiment, for example, a carriage return was required quite often.
Each user could use the voice command most comfortable for him, and
in the case of carriage return, some subjects used "return,"
"carriage return," or "go," while others chose "do it," "send it," or
"roger." In a few cases, a subject even requested that he be able
to use two different utterances which sent out the same ASCII stream
of characters, so that if he forgot one of the utterances during the
stress of performing the experiment, he could use his alternate
command just as easily.

C. Voice input also appears to reduce the problems of entering complicated
strings of characters. If a user needs to enter "*/(LEN=) \\*",
he may make numerous mistakes in a manual keyboard entry mode, but
with voice input, he can simply choose a phrase he likes to use,
and the above output ASCII stream is always the same and is entered
for him automatically.
D. Several subjects mentioned that with voice input they felt they
had better command of the situation because they could see what 
the network was doing and at the same time their hands were free. 

With manual input they felt they were more at the mercy of the 
keyboard and concentrated more on typing the right characters rather 
than observing the big picture of what was going on. 

E. Our particular models of voice recognition equipment contain a 
structuring feature which allows one to operate on a subset of the 
total 256 utterances. By only operating on a subset of the utterances, 
one would get faster recognition times. However, it is this writer’s 
experience that structuring is not needed. Even when using all 

256 possible utterances in the memory of the voice recognizer, the 
response time is so fast that it is practically impossible for the 
user to notice any delay. We commonly use all 256 utterances, and 
in such cases, we can enter a voice command to the recognizer, and 
before one can blink an eye, a host computer hundreds of miles away 
is replying. Therefore we currently find it not necessary to use 
structuring of any kind, although it is a topic for future research. 

F. It is also interesting to observe a behavioral phenomenon when
introducing people to voice input. This author often gives
demonstrations of various software products in the NPS command center.
I can literally make many mistakes in manual typing when running a
particular demo, and people will accept my poor typing ability and
be happy. However, when I use voice input, I might make one mistake
an hour, but the observers will immediately notice it and say
something to the effect that voice input is nice, but is not perfect
and has a way to go. That is true, but it is interesting to note
that moments before I made all sorts of manual typing input mistakes
and it did not bother them!

G. We also found the portability of our units to be a nice feature. 
Since we do not depend on any foreign host computer to store the 
voice patterns, we can go anywhere with our units and be operational 
immediately. 
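Items B, C, and E above describe three properties of the stored vocabulary: several phrases may share one ASCII stream, a single phrase may stand for a long error-prone string, and "structuring" restricts recognition to a subset of the vocabulary. The Python sketch below models all three with a plain dictionary. The particular phrases are taken from the text except "length mask," which is a hypothetical phrase, and the functions are illustrative names, not the T600's interface.

```python
from typing import Dict, Optional, Set

# One user's hypothetical vocabulary: several phrases may share one ASCII stream.
VOCABULARY: Dict[str, str] = {
    "return": "\r", "carriage return": "\r", "go": "\r",
    "do it": "\r", "send it": "\r", "roger": "\r",
    "forward message": "F", "foxtrot": "F",
    "length mask": r"*/(LEN=) \\*",  # one phrase for an error-prone string
}
CAPACITY = 256  # utterances the unit described here could hold


def check_capacity(vocab: Dict[str, str]) -> None:
    """Refuse a vocabulary larger than the unit's utterance memory."""
    if len(vocab) > CAPACITY:
        raise ValueError(f"{len(vocab)} utterances exceed the limit of {CAPACITY}")


def output_for(phrase: str, active: Optional[Set[str]] = None) -> Optional[str]:
    """ASCII stream for a recognized phrase, or None if it is not a candidate.

    `active` models structuring: when given, only that subset of the
    vocabulary is searched.
    """
    if active is not None and phrase not in active:
        return None
    return VOCABULARY.get(phrase)
```

In this model, adding a synonym costs one more of the 256 entries but no change on the host side, since the host only ever sees the shared ASCII stream.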

XI. CONCLUSIONS 

Based on the results of this experiment, 24 military officers were
able to effectively operate a distributed computer network with minimal
voice training. Even though they already knew how to operate the
network in a manual typing input mode, they were able to operate the
network faster using voice input, they made far fewer input errors with
voice, and at the same time, they managed to get 25% more work done on
another task when using voice input than when using manual typing input.

The results suggest that voice input may be a technology which can
be of benefit in command center operations, combat information centers and
similar installations.

Future and/or current plans for our experiments include examining: 

1) The use of voice input with military decision aids.

2) The use of voice input with interactive graphics.

3) The use of voice input by users during tactical computer games.

4) The use of voice input for human image interpreters.

5) The use of voice input in NATO type command centers where multi- 
lingual users are prevalent. Pilot experiments have indicated that 
for the 10 training passes used for each utterance, we can enter 5 
passes in English and 5 in German for a given utterance, and then 
the voice recognizer still appears to work quite well whether one 
speaks in English or German. If in fact we can make this work 
satisfactorily, we can effectively double the possible utterances 
from 256 to 512. 

6) The effect of shipboard and command center environmental noises and
disturbances on voice input.

7) The effect of multi-task mental loading on an operator and his voice
input performance.

8) The amount of training required for effective use in various tasks. 
APPENDIX I



(Subjects were asked the following questions both before and after
the experiment. Items 1 and 2 were yes or no responses. For Items 3 through
7, subjects marked their choice on a scale from 1 to 7, where 1 was a very
strong feeling for manual input, 4 was a neutral feeling, and 7 was a very
strong feeling for voice input. Verbs were changed appropriately for questions
when asked after the experiment.)

1. Have you used voice input before? 

2. Have you seen voice input used before? 

3. Which might be easier, manual typing input or voice input for communi- 
cating to a computer? 

4. Would you be more relaxed using manual typing input or voice input? 

5. Would you have more flexibility in entering items to a computer with 
voice input or manual typing input? 

6. Would voice input or manual typing allow you more time and freedom to 
do other things? 

7. Would you be more frustrated using voice input or manual typing? 

(On Items 8-11, subjects marked their choice on a scale from 1 to 7, where 1 
was "absolutely NO," 7 was "absolutely YES," and 4 was a neutral feeling.) 

8. In general, do you like the idea of voice input? 

9. In general, do you think you would like to use voice input in everyday
tasks yourself if it were applicable?

10. In general, do you think voice input would be useful for application
in command and control tasks?

11. In general, do you think voice input could be used in military tasks 
other than command and control? 
APPENDIX II



SCENARIO INSTRUCTIONS 

1. GO TO HOST ISIE (host 116) 

2. See if there is MAIL for EXPERIMENTAL 

3. LOG INTO EXPERIMENTAL 

a) GET THE LOAD AVERAGE 

b) Go into MSG

c) Read the 1st message

d) FORWARD the 2nd message to Poock

e) Call the message "VOICE DEMO" 

NO CC: 

Don't add any new text 
Send it 

f) Exit to EXEC LEVEL 

g) Get the LOAD AVERAGE 

4. TELNET TO ISIC 

5. See if there is MAIL for C3DEMO 

6. LOG IN TO C3DEMO

a) List all the directory files 

b) Type out the file beginning with a Z 

c) Go into MSG 

d) Read the 3rd message

e) Exit back to EXEC LEVEL 

7. LOGOUT

8. DISCONNECT AND QUIT BACK TO EXEC LEVEL AT ISIE.

9. GET THE LOAD AVERAGE 
10. FTP to ISIC

a) Log into C3DEMO

b) List the C3DEMO Directory on your TTY

c) Get the remote file "LADDER.RUNFIL" to your local file
"VOICE.RUNFIL"

d) Break the FTP connection and Disconnect and Quit. 

11. You are back at ISIE now 

a) Delete the file "VOICE.RUNFIL"

b) Go into MSG 

c) Send a message as follows: 

TO: POOCK 

CC: C3DEMO

SUBJECT: Pacific Report 

MESSAGE: All Units Ready 

WX report = clear 

d) Send it 

e) Exit back to EXEC LEVEL 

12. Get the Load Average of the system 

13. TELNET to BBNA 

a) Log in as NPS 

b) List all the directory files 

14. LOGOUT of BBNA 

a) Disconnect and Quit Back to EXEC LEVEL at ISIE 

15. Get the LOAD AVERAGE 

16. LOGOUT. 
APPENDIX III



EXAMPLE OF AN OBSERVATION AS FOUND ON HOURLY SEQUENCES

[Figure: sample coded aviation weather report]

DECODING AVIATION WEATHER REPORTS

U.S. DEPARTMENT OF COMMERCE
ENVIRONMENTAL SCIENCE SERVICES ADMINISTRATION
WEATHER BUREAU
SILVER SPRING, MD. 20910

Based on Instructions in Federal Meteorological Handbook
No. 1, Surface Observations

STANDARD AVIATION REPORT FORMAT FOR MANNED STATIONS

REMARKS: Visibility variable between 1/2 and 1 mile.

REMARKS: Ceiling variable between ^00 and 1200 feet.

BASES AND TOPS OF CLOUDS: Tops of broken layer 2700 ft. MSL. Height of bases not
visible at the station precedes sky cover symbol. "L" indicates layer amount unknown.
If the report is more than 15 minutes old, the time (GMT) precedes the entry.



REMARKS: Fog and smoke hiding 3/10 of sky.



RUNWAY VISUAL RANGE: Runway 10L, visual range variable between 2600 and
5500 ft. in past 10 minutes. When visual range is constant for past 10 minutes, only
the constant value is reported.



ALTIMETER SETTING: 29.57 inches. Three figures, representing units, tenths and
hundredths of inches, indicate the altimeter setting. "Low" is used preceding figures
to indicate values below 29.00 inches.

WIND: 270° true, 13 kts. To decode direction, multiply first 2 digits by 10. If product
is 500 or more, subtract 500 and add 100 to speed. Gusts and squalls are indicated by "G"
or "Q" following speed, and peak speed follows the letter.
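The wind rule above is mechanical enough to sketch in code. The Python function below is a hypothetical illustration, not part of the original study. It assumes a group made of two direction digits, two or three speed digits, and an optional "G" or "Q" followed by the peak speed, because the card states the decoding rule but not an exact grammar.

```python
import re

# Hypothetical helper: decodes a wind group such as "2713" or "2713G20".
# Assumed layout (not spelled out on the card): two direction digits,
# two or three speed digits, then an optional "G" (gusts) or "Q" (squalls)
# followed by the peak speed in knots.
WIND_GROUP = re.compile(r"^(\d{2})(\d{2,3})(?:([GQ])(\d{2,3}))?$")


def decode_wind(group: str) -> dict:
    match = WIND_GROUP.match(group)
    if match is None:
        raise ValueError(f"unrecognized wind group: {group!r}")
    dir_digits, speed_digits, letter, peak_digits = match.groups()

    # Multiply the first two digits by 10 to get degrees true.
    direction = int(dir_digits) * 10
    speed = int(speed_digits)

    # A product of 500 or more flags a speed of 100 knots or more.
    if direction >= 500:
        direction -= 500
        speed += 100

    peak = int(peak_digits) if peak_digits else None
    kind = {"G": "gusts", "Q": "squalls"}.get(letter)
    return {"direction": direction, "speed": speed, "kind": kind, "peak": peak}


print(decode_wind("2713"))
# {'direction': 270, 'speed': 13, 'kind': None, 'peak': None}
print(decode_wind("7715"))
# {'direction': 270, 'speed': 115, 'kind': None, 'peak': None}
```

The second call shows the 500 offset: 77 x 10 = 770, which decodes to 270° at 115 knots.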



TEMPERATURE: 66°F. A minus sign indicates temperatures below zero.



SEA LEVEL PRESSURE: 1014.6 millibars. Only the tens, units and tenths digits
are reported.
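Because only three digits of the sea level pressure are reported, a reader has to restore the leading digits. The card does not say how, so the Python sketch below assumes the common convention that codes below 500 fall in 1000.0 to 1049.9 mb and codes of 500 or more fall in 950.0 to 999.9 mb. The function name is hypothetical.

```python
# Hypothetical helper: restores a sea level pressure coded as three digits
# (tens, units and tenths of millibars), e.g. "146" -> 1014.6.
# Assumption: codes below 500 belong to 1000.0-1049.9 mb, and codes of
# 500 or more belong to 950.0-999.9 mb (not stated on the card).
def decode_sea_level_pressure(code: str) -> float:
    if len(code) != 3 or not code.isdigit():
        raise ValueError(f"expected three digits, got {code!r}")
    tenths = int(code)                      # e.g. 146 -> 14.6 mb above base
    base = 1000.0 if tenths < 500 else 900.0
    return round(base + tenths / 10, 1)


print(decode_sea_level_pressure("146"))     # 1014.6
print(decode_sea_level_pressure("982"))     # 998.2
```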



WEATHER AND OBSTRUCTIONS TO VISION: Light drizzle, fog, smoke. Symbols
used in reporting weather and obstructions to vision are in Table 1. Algebraic signs
(Table 1) following symbols indicate intensity.

PREVAILING VISIBILITY: Seven-eighths statute mile and variable by the amount
given in REMARKS.



SKY AND CEILING: Partly obscured sky, ceiling measured M 00 ft., variable broken,
3 POO ft. overcast. Figures are height of each layer in 100s of feet above ground. A
number preceding an X indicates vertical visibility into phenomena. A "V" indicates
height varying by amount given in REMARKS. Symbol after height is amount of sky
cover (Table 2). The letter preceding height indicates that height to be the ceiling and
the method used to determine the height (Table 3).
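The layer rules above can be sketched as a small parser. This Python example is hypothetical. It assumes a layer is written as an optional ceiling-designator letter, the height figures in hundreds of feet, an optional "V", and then the sky cover symbol. The card gives these elements but not exact spacing. The sample tokens are illustrative and are not taken from the report shown in this appendix.

```python
import re

# Hypothetical helper for one sky-condition layer such as "M12V BKN" or
# "25 OVC". Assumed token shape: optional ceiling-designator letter,
# height figures in hundreds of feet, optional "V", then the cover symbol.
LAYER = re.compile(r"^([A-Z])?(\d{1,3})(V)?\s*([A-Z-]+)$")


def decode_layer(token: str) -> dict:
    match = LAYER.match(token.strip())
    if match is None:
        raise ValueError(f"unrecognized layer: {token!r}")
    letter, figures, variable, cover = match.groups()
    return {
        "height_ft": int(figures) * 100,   # figures are hundreds of feet
        "is_ceiling": letter is not None,  # a leading letter marks the ceiling
        "method": letter,                  # meaning is looked up in Table 3
        "variable": variable == "V",       # variation amount is in REMARKS
        "cover": cover,                    # symbol is looked up in Table 2
    }


print(decode_layer("M12V BKN"))
# {'height_ft': 1200, 'is_ceiling': True, 'method': 'M', 'variable': True, 'cover': 'BKN'}
```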



TYPE OF REPORT (Table 4): "U" omitted when observation is in hourly sequence.
STATION IDENTIFICATION: Identifies report for Pittsburgh by using FAA identifier.
Sample



AVIATION WEATHER REPORT 
DATA SHEET 



REMARKS 

BASES AND TOPS OF CLOUDS 
RUNWAY VISUAL RANGE 
WIND 

TEMPERATURE 
PREVAILING VISIBILITY 
SKY AND CEILING 
STATION IDENTIFICATION 



(NOTE: A sample aviation weather report is shown on the previous page.

The data sheet shown above was attached to each weather report.

In the above case, the subject would look for remarks on the report,
copy down BC198 on the data sheet and proceed to the next item.

The values on the aviation weather reports were all different, and
the items asked for on the data sheets varied; i.e., sometimes
a data sheet asked for WIND and other times not. When a weather
report was done, the subject went on to the next weather report and
data sheet.)
APPENDIX IV



VOICE RECOGNIZER PERFORMANCE 
DETAILS IN OPERATIONAL EXPERIMENT 



Figure 3 in the text discusses input errors to the network. Although
that figure translates into a 3% error rate for voice, the data below show
the actual performance of the recognizer in various categories. (If a subject
says an utterance and the T600 does not recognize it, the T600 beeps
and no ASCII output string is sent.)

TOTAL UTTERANCES in this operational experiment were 7,200
(i.e., 75 utterances per trial x 4 voice trials x 24 subjects).

Recognizer Details:

Category                                                     % of time

1. Correct Utterance AND Correct Output                          96.80
2. Correct Utterance AND Wrong Output                             0.76
3. Correct Utterance AND No Output (Beep)                         0.36
4. Invalid Utterance AND No Output (Beep)                         0.78
5. Invalid Utterance AND Recognizer Put Out Something
   When It Should Have Beeped                                     1.30



Items in 5 above were caused mostly by the inexperienced subjects mumbling 
and trying to figure out where they were under the time pressure of the 
scenario and the secondary task. 
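As an arithmetic check on the table, the short Python script below recomputes the utterance total, the category sum, and the combined recognizer error rate. The percentages are transcribed from the table. The per-category counts it prints are derived approximations, since the report gives only percentages.

```python
# Category percentages transcribed from the Recognizer Details table.
CATEGORIES = {
    1: ("correct utterance, correct output", 96.80),
    2: ("correct utterance, wrong output", 0.76),
    3: ("correct utterance, no output (beep)", 0.36),
    4: ("invalid utterance, no output (beep)", 0.78),
    5: ("invalid utterance, output when it should have beeped", 1.30),
}

TOTAL_UTTERANCES = 75 * 4 * 24  # utterances/trial x voice trials x subjects

total_pct = sum(pct for _, pct in CATEGORIES.values())
error_pct = sum(pct for cat, (_, pct) in CATEGORIES.items() if cat != 1)

print(TOTAL_UTTERANCES)       # 7200
print(round(total_pct, 2))    # 100.0
print(round(error_pct, 2))    # 3.2

# Approximate utterance counts implied by each percentage (derived, not reported).
for cat, (label, pct) in CATEGORIES.items():
    print(cat, label, round(TOTAL_UTTERANCES * pct / 100))
```

The 3.2% error figure is the "about 3%" total error rate for the recognizer cited below.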
Although the total error rate for the recognizer is about 3% and the rate
shown in Figure 3 is also about 3%, note that Figure 3 shows input
errors to the network. Figure 3 is therefore based on the errors in
Categories 2 and 3 plus operational input errors, in which the recognizer
worked correctly but the subject entered the wrong command to the network
when the scenario required a different command.
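To make that distinction concrete, this Python sketch isolates the recognizer's share of the network input errors, meaning Categories 2 and 3 only. The values are taken from the table. The utterance count is a derived approximation. The operational-error share is not computed because the report gives Figure 3 only as "about 3%".

```python
# Recognizer-caused network input errors: Categories 2 and 3 only.
WRONG_OUTPUT_PCT = 0.76   # Category 2: correct utterance, wrong output
NO_OUTPUT_PCT = 0.36      # Category 3: correct utterance, no output (beep)
TOTAL_UTTERANCES = 7200   # 75 utterances x 4 voice trials x 24 subjects

recognizer_part = WRONG_OUTPUT_PCT + NO_OUTPUT_PCT

print(round(recognizer_part, 2))                          # 1.12
print(round(TOTAL_UTTERANCES * recognizer_part / 100))    # 81
```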
APPENDIX V



SUGGESTED VOCABULARY 

The following phrases were suggested, but subjects could use their
own phrases instead if they wished. The 180-utterance vocabulary was
entirely open, with no branching to subsets of words during the experiment.
The first phrase, GO TO ECHO, for example, was 3 words spoken continuously
to make one utterance, and similarly for the other phrases.



GO TO ECHO 


C3 DEMO 


CONTROL ALPHA 


DELETE MESSAGE 


GET FILE 


DELETE FILE 


VOICE RUNFILE 


TYPE FILE 


LADDER RUNFILE 


BACKSPACE 


FORWARD MESSAGE 


SPACE 


VOICE DEMO 


CONTROL N 


STRAIT OF HORMUZ 


LOAD AVERAGE 


AIR ROUTES 


GARY POOCK 


RUSSIAN VERSION OF HORMUZ


PACIFIC REPORT


CLOSE OUT CHARLIE 


ALL UNITS READY 



GENISCO ZERO PARAMETERS DIRECTORY 



THREE MAPS 


TTY 


LEVEL TWO VIEWER 


ESCAPE 


MEDITERRANEAN MAP 


WEATHER REPORT 


NORTH ATLANTIC MAP 


REDSPHERE 


SOUTH ATLANTIC MAP 


CONNECT TO CHARLIE 


SMILE 


CHANGE DIRECTORY 


QUIT 


COLOR BLOCK 


TYPE MESSAGE 


CONTROL QUEBEC 


HEADERS 


TENEX 


SEND MESSAGE 


LOGIN NPS 


GO 


LOGIN NPS ONE 


DASH 


TELNET TO UNIX 


COMMA 


TELNET TO TENEX 
PERIOD



TELNET TO TOPS20 

CONTROL HOTEL 

COLOR BLOCK RUNFILE 

EXAMPLE RUNFILE 

CHANGE DIRECTORY TO POOCK 

SEARCH AND RESCUE 

LOAD GLD3 

LOAD THE SERVER 

LOAD THE GANN 

CONNECT TO ECHO 

CONTROL DELTA 

INFO MAIL EXPERIMENTAL 

MAIL CHECK C3DEMO 

TELNET 

DISCONNECT 

FTP 

GOODBYE 
ISI CHARLIE 
BBN ALPHA 
ISI ECHO 
MSG 
EXIT 

CONTROL O
CONTROL Z 
CONTROL CHARLIE 
LOGOUT 

LOGIN C3 DEMO 
LOGIN EXPERIMENTAL 
C2 NET CONTROL PASSWORD 
LOGIN C2 NET CONTROL 
GO TO BBN ALPHA 
LARRY SHACKLETON 
MAIL BOX 

GO TO SRI DASH KL 
SEMICOLON 



FROM 

ASTERISK 
UNDELETE FILE 
TEXT EDITOR 
MAIL STAT 
STAMMER 2 RUNFILE 
CONTROL BRAVO 
CONTINUE 
JACK WOZENCRAFT 
REX STOUT 
JACK DIETZLER 
SEND FILE 
ISI ALPHA 

POOCK NPS PASSWORD 

AC CAT BOX 

LOGIN ACCAT BOX 

ANSWER MESSAGE 

FORWARD MESSAGE 

NORTH 

EAST 

SOUTH 

WEST 

MY POSITION IS 
GO TO CHARLIE 
INFO DISK 
DISK STATUS 
WHOIS 

LOGIN XCNO 

WHARTON 

SLANT 

RECENT MESSAGES 
NOT EXAMINED 
LOGIN HOLLISTER 
R SCHLAFF 
KNELMS


DOWN IN DETAIL 


SEND 


ACCAT TITLE 


TRANSMIT 


MOVE IT DOWN 


MAIL CHECK STOUT 


MOVE IT UP 


AT 


MOVE IT LEFT 


ERASE 


MOVE IT RIGHT 


CANCEL 


BREAK 


CLOSE CONNECTION 


SPIROGRAPH 




USE THAT ONE 


CONTROL TANGO 


LEVEL TWO 


SPHERE 


GRAPHICS 


UP IN DETAIL 


MARBLES 



PLUS the 10 digits and the 26-word military alphabet were also included.